Adaptive Path-Integral Approach for Representation Learning and Planning

Authors

Jung-Su Ha

Young-Jin Park

Hyeok-Joo Chae

Soon-Seo Park

Han-Lim Choi

Abstract

We present a novel framework for representation learning that builds a low-dimensional latent dynamical model from high-dimensional sequential raw data, e.g., video. The framework builds upon recent advances in amortized inference that construct a fully differentiable network, and takes advantage of the duality between control and inference to solve the intractable inference problem using the path integral control approach. We also present an efficient planning method that exploits the learned low-dimensional latent dynamics.

1 APPROXIMATE INFERENCE VIA STOCHASTIC OPTIMAL CONTROL

In approximate inference, it is known that a tighter evidence lower bound (ELBO) can be achieved by using multiple samples, $z^{1:L}$, independently sampled from the proposal distribution $q(z)$:

$$\log p(x) \ge \mathcal{L}^{L} \equiv \mathbb{E}_{z^{1:L} \sim q(\cdot)} \left[ \log \frac{1}{L} \sum_{l=1}^{L} \frac{p(x, z^{l})}{q(z^{l})} \right] \ge \mathcal{L}^{L-1}. \tag{1}$$

It is proven that the ELBO gets tighter as $L$ increases (Burda et al., 2016; Cremer et al., 2017). This multi-sample objective, $\mathcal{L}^{L}$, is referred to as a Monte Carlo objective (MCO) in the sense that it utilizes independent samples to estimate the marginal likelihood (Mnih & Rezende, 2016):

$$\hat{I}(z^{1:L}) = \frac{1}{L} \sum_{l=1}^{L} \frac{p(x, z^{l})}{q(z^{l})} = \frac{1}{L} \sum_{l=1}^{L} \frac{p(x \mid z^{l})\, p(z^{l})}{q(z^{l})}, \quad z^{l} \sim q(\cdot), \ \forall l \in \{1, \ldots, L\}. \tag{2}$$

The performance of an MCO-based learning algorithm crucially depends on the variance of $\hat{I}(z^{1:L})$, which can be reduced by decreasing the gap between the proposal distribution, $q(z)$, and the true posterior distribution, $p(z \mid x)$; when $q(z) = p(z \mid x)$, the variance reduces to 0.

This work particularly considers a continuous-time latent state trajectory $z_{[0,T]}$ whose probability measure, $p$, is induced by (3) with $u(t) = 0$, $\forall t \in [0, T]$, and a sequential observation $x_{1:K}$ with $p(x_{1:K} \mid z_{[0,T]}) = \prod_{k=1}^{K} p(x_k \mid z(t_k))$, where $\{t_k\}$ is a sequence of discrete time points.
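The multi-sample bound in Eq. (1) and the variance remark after Eq. (2) can be checked numerically on a toy problem. The sketch below is only an illustration under assumed choices that do not come from the paper: a scalar model with prior $z \sim \mathcal{N}(0, 1)$ and likelihood $x \mid z \sim \mathcal{N}(z, 1)$, for which $\log p(x)$ and the posterior $\mathcal{N}(x/2, 1/2)$ are known in closed form. The helper names `log_normal`, `log_weights`, and `mco_bound` are hypothetical.

```python
import numpy as np

def log_normal(v, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (v - mean) ** 2 / var)

def log_weights(x, z, q_mean, q_var):
    """log[ p(x|z) p(z) / q(z) ] for the assumed toy model:
    p(z) = N(0, 1), p(x|z) = N(z, 1), q(z) = N(q_mean, q_var)."""
    return (log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0)
            - log_normal(z, q_mean, q_var))

def mco_bound(x, L, q_mean, q_var, n_rep=20000, seed=0):
    """Monte Carlo estimate of the L-sample bound in Eq. (1) and of the
    variance of the estimator I_hat in Eq. (2), using n_rep repetitions."""
    rng = np.random.default_rng(seed)
    z = rng.normal(q_mean, np.sqrt(q_var), size=(n_rep, L))
    lw = log_weights(x, z, q_mean, q_var)
    m = lw.max(axis=1, keepdims=True)  # stabilise the log-mean-exp
    log_i_hat = m[:, 0] + np.log(np.mean(np.exp(lw - m), axis=1))
    return log_i_hat.mean(), np.exp(log_i_hat).var()

x = 1.5
log_px = log_normal(x, 0.0, 2.0)  # exact log-evidence: p(x) = N(0, 2)

# Proposal = prior (a deliberately poor choice): the bound tightens with L.
bounds = [mco_bound(x, L, q_mean=0.0, q_var=1.0)[0] for L in (1, 4, 16, 64)]

# Proposal = exact posterior N(x/2, 1/2): every weight equals p(x).
_, var_exact = mco_bound(x, 8, q_mean=x / 2.0, q_var=0.5)

print("log p(x):", log_px)
print("bounds for L = 1, 4, 16, 64:", bounds)
print("variance of I_hat with q = posterior:", var_exact)
```

In this toy setting the estimated bounds should increase toward `log_px` as `L` grows, and the variance with the exact posterior as proposal should vanish up to floating-point error, which is the behaviour the text describes.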
Consider continuous-time stochastic dynamics with a state, $z \in \mathbb{R}^{d_z}$, and a control, $u \in \mathbb{R}^{d_u}$:

$$dz(t) = f(z(t))\,dt + \sigma(z(t))\big(u(t)\,dt + dw(t)\big), \quad z(0) \sim p_0(\cdot), \tag{3}$$

where $w(t)$ is a $d_u$-dimensional Wiener process, and let $q_u$ be the probability measure induced by the controlled trajectories. There is a class of stochastic optimal control problems whose objective function can be written in the form of a KL divergence by Girsanov's theorem:

$$J = \mathrm{KL}\big( q_u(z_{[0,T]}) \,\|\, p^{*}(z_{[0,T]}) \big) - \log \xi, \tag{4}$$

where $p^{*}$, represented as $dp^{*}(z_{[0,T]}) = \exp(-V(z_{[0,T]}))\,dp(z_{[0,T]})/\xi$, is the probability measure induced by the optimally controlled trajectories with a state cost function $V(z_{[0,T]}) \equiv \int_{0}^{T} V(z(t))\,dt$, and $\xi \equiv \int \exp(-V(z_{[0,T]}))\,dp(z_{[0,T]})$ is a normalization constant (see Appendix A for details). By applying Girsanov's theorem again, the optimal trajectory distribution is expressed as:

$$dp^{*}(z_{[0,T]}) \propto dq_u(z_{[0,T]}) \exp\big( -S_u(z_{[0,T]}) \big), \tag{5}$$

$$S_u(z_{[0,T]}) = V(z_{[0,T]}) + \frac{1}{2} \int^{T} \cdots$$
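As a rough illustration of the quantities around Eq. (3) and Eq. (4), the sketch below rolls out the uncontrolled dynamics ($u = 0$, so the sampling measure is $p$) with an Euler-Maruyama scheme and reweights the sampled paths by $\exp(-V)$, which is the definition of $p^{*}$ given above. Everything concrete here is an assumption made for illustration and is not taken from the paper: the scalar drift $f(z) = -z$, the unit diffusion $\sigma(z) = 1$, the state cost $V(z) = z^2/2$, the initial distribution $p_0 = \mathcal{N}(0, 1)$, and the helper name `simulate_uncontrolled`. The control-dependent terms of $S_u$ are not used because they vanish for $u = 0$.

```python
import numpy as np

def simulate_uncontrolled(f, sigma, V, z0, T=1.0, n_steps=100, seed=0):
    """Euler-Maruyama rollout of Eq. (3) with u = 0 for a batch of scalar
    states. Returns the final states and the accumulated path cost, a
    left Riemann sum approximating the integral of V(z(t)) over [0, T]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = np.array(z0, dtype=float)
    cost = np.zeros_like(z)
    for _ in range(n_steps):
        cost += V(z) * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=z.shape)  # Wiener increment
        z = z + f(z) * dt + sigma(z) * dw
    return z, cost

# Assumed toy problem (hypothetical choices, see the lead-in).
n_paths = 5000
z0 = np.random.default_rng(1).normal(0.0, 1.0, size=n_paths)
z_T, path_cost = simulate_uncontrolled(
    f=lambda z: -z,
    sigma=lambda z: np.ones_like(z),
    V=lambda z: 0.5 * z ** 2,
    z0=z0,
)

# Normalisation constant xi = E_p[exp(-V)], estimated by Monte Carlo.
xi_hat = np.mean(np.exp(-path_cost))

# Self-normalised weights that turn samples from p into samples from p*.
weights = np.exp(-path_cost)
weights /= weights.sum()

# KL(p || p*) = E_p[V] + log(xi), which follows from the definition of p*.
kl_hat = np.mean(path_cost) + np.log(xi_hat)

# Effective sample size: how many paths carry most of the weight.
ess = 1.0 / np.sum(weights ** 2)

print("xi estimate:", xi_hat)
print("KL(p || p*) estimate:", kl_hat)
print("effective sample size:", ess, "of", n_paths)
```

A low effective sample size here would indicate that the uncontrolled measure $p$ is a poor proposal for $p^{*}$, which is the motivation for sampling from a controlled measure $q_u$ instead.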

Related resources

AI Planning Versus Manufacturing-Operation Planning: A Case Study

Although AI planning techniques can potentially be useful in several manufacturing domains, this potential remains largely unrealized. In order to adapt AI planning techniques to manufacturing, it is important to develop more realistic and robust ways to address issues important to manufacturing engineers. Furthermore, by investigating such issues, AI researchers ma...

Now or Never?

An exploration was made, on the initiative of Professor P. C. Mahalanobis, F.R.S., of possible applications of geographical techniques to problems of national planning in India. In particular, a pilot project in regional survey for planning purposes was carried out in Mysore State, with the collaboration of Indian geographers, statisticians, and econom...

Amitraz Poisoning; A case study

Amitraz, an insecticide/acaricide of the formamidine pesticides group, is an α2-adrenergic agonist and a member of the amidine chemical family, generally used to control animal ectoparasites. Poisoning due to amitraz is rare and character...

Wall Shear Stress Measurement Error in the Common Carotid Artery: A Dual Modality Study

A. J. Barker, F. Zhang, P. E. Gates, L. A. Mazzaro, J. Fulford, C. J. Lanning, and R. Shandas Mechanical Engineering, University of Colorado, Boulder, CO, United States, Peninsula Medical School, University of Exeter, United Kingdom, Division of Cardiology, The Children's Hospital, Aurora, CO, United States, Center for Bioengineering, University of Colorado at Denver Health Sciences, CO, United...

Terrain Navigation Through Knowledge-based Route Planning

The advent of advanced computer architectures for parallel and symbolic processing has evolved to the point where the technology currently exists for the development of prototype autonomous vehicles. Control of such devices will require communication between knowledge-based subsystems in charge of the vision, planning, and conf...

Publication date: 2018
